forked from huggingface/optimum
Fix optimum compat #3
Merged
Conversation
# Conflicts:
#   optimum/gptq/quantizer.py
fix UnboundLocalError: cannot access local variable 'version' where it is not associated with a value
jiqing-feng added a commit that referenced this pull request on Dec 23, 2024.
Squashed commit message (the `version` variable fix and the v2-to-v1 conversion at save time from this list are sketched below):

* align gptq check to transformers for supporting cpu
* fix comment
* gptqmodel (Signed-off-by: jiqing-feng <[email protected]>)
* compatible with auto-gptq (Signed-off-by: jiqing-feng <[email protected]>)
* fix compatible with auto-gptq (Signed-off-by: jiqing-feng <[email protected]>)
* fix compatible with auto-gptq linear (Signed-off-by: jiqing-feng <[email protected]>)
* revert unrelated changes (Signed-off-by: jiqing-feng <[email protected]>)
* gptqmodel need use checkpoint_format (#1)
  * need checkpoint_format
  * default value of checkpoint_format is gptq
  * fix quantize
  * fix quantize
  * fix quantize
  * Update quantizer.py
  * need convert to v1 before gptqmodel save
  * back checkpoint_format to gptq after convert
  * cleanup code
  * sym=False is not supported with auto-gptq
  * add comments
  * cleanup code
  * Update quantizer.py
  * always convert v2 to v1 if checkpoint_format = "gptq"
  * Update quantizer.py
  * (Co-authored-by: ZX-ModelCloud <[email protected]>, Qubitium-ModelCloud <[email protected]>)
* Mod backend code (#2)
  * keep gptq_v2 if sym is false
  * use hf_convert_gptq_v1_to_v2_format, hf_convert_gptq_v2_to_v1_format, and hf_gptqmodel_post_init
  * no need check backend
  * use device_map
  * cleanup
  * Update quantizer.py
  * move import
  * (Co-authored-by: Qubitium-ModelCloud <[email protected]>)
* fix format and log (Signed-off-by: jiqing-feng <[email protected]>)
* fix version check (Signed-off-by: jiqing-feng <[email protected]>)
* enable gptqmodel tests (Signed-off-by: jiqing-feng <[email protected]>)
* update check quant type (Signed-off-by: jiqing-feng <[email protected]>)
* Fix optimum compat (#3)
  * add meta info
  * cleanup
  * cleanup
  * The value of quantizer should be an array
  * Update quantizer.py
  * If is_auto_gptq_available() also writes "auto_gptq:version" to "quantizer"
  * If is_auto_gptq_available() also writes "auto_gptq:version" to "quantizer"
  * Update quantizer.py
  * cleanup
  * comment on meta
  * hf_select_quant_linear pass checkpoint_format
  * add todo fix
  * move convert code to quantizer.save()
  * Update quantizer.py
  * Optimize hf_convert_gptq_v2_to_v1_format()
  * Optimize hf_convert_gptq_v1_to_v2_format()
  * fix GPTQTestCUDA
  * hf_select_quant_linear() always set pack=True
  * gptqmodel.hf_select_quant_linear() now does not select ExllamaV2
  * gptqmodel.hf_select_quant_linear() now does not select ExllamaV2
  * GPTQQuantizer add backend
  * lower checkpoint_format and backend
  * cleanup
  * move backend to bottom
  * no need to check gptqmodel version for ipex support
  * Update import_utils.py
  * Update quantizer.py
  * fix UnboundLocalError: cannot access local variable 'version' where it is not associated with a value
  * make version var short
  * Update import_utils.py
  * fix unittest
  * use assertLessEqual
  * (Co-authored-by: Qubitium-ModelCloud <[email protected]>, LRL <[email protected]>)
* fix format and convert v2 to v1 (Signed-off-by: jiqing-feng <[email protected]>)
* [Fix] all tensors not same device (#5)
  * fix device error
  * update gptqmodel version
  * fix test
* fix format (Signed-off-by: jiqing-feng <[email protected]>)
* add gptqmodel tests which contains cpu (Signed-off-by: jiqing-feng <[email protected]>)
* fix all auto-gptq tests (Signed-off-by: jiqing-feng <[email protected]>)
* revert tests (Signed-off-by: jiqing-feng <[email protected]>)
* rm gptqmodel yaml (Signed-off-by: jiqing-feng <[email protected]>)
* fix comment (Signed-off-by: jiqing-feng <[email protected]>)
* enable real cpu tests by fp32 (Signed-off-by: jiqing-feng <[email protected]>)
* fix test model name (Signed-off-by: jiqing-feng <[email protected]>)
* keep the original device setting when using auto-gptq (Signed-off-by: jiqing-feng <[email protected]>)
* Update optimum/gptq/quantizer.py (Co-authored-by: Ilyas Moutawwakil <[email protected]>)
* Update optimum/gptq/quantizer.py (Co-authored-by: Ilyas Moutawwakil <[email protected]>)

Signed-off-by: jiqing-feng <[email protected]>
Co-authored-by: LRL-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: LRL <[email protected]>
Co-authored-by: Ilyas Moutawwakil <[email protected]>
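The UnboundLocalError called out in the list ("cannot access local variable 'version' where it is not associated with a value") is a plain Python pitfall: a variable assigned only on some branches is read on a path that never assigned it. A minimal sketch of the failure mode and the usual fix; the function and its fallback logic are hypothetical, and only the variable name `version` is taken from the commit message:

```python
from importlib.metadata import PackageNotFoundError
from importlib.metadata import version as get_version

def detect_gptq_backend():
    """Report which GPTQ backend is installed and its version.

    Hypothetical example: without the `version = None` default below, any code
    path that never assigns `version` (e.g. neither package installed) would
    raise UnboundLocalError at the return statement, which is the error the
    commit message describes.
    """
    backend = None
    version = None  # the fix: bind the name on every path before it is read
    for candidate in ("gptqmodel", "auto_gptq"):
        try:
            version = get_version(candidate)
            backend = candidate
            break
        except PackageNotFoundError:
            continue
    return backend, version
```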
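Several commits above describe the checkpoint_format handling: gptqmodel works with the newer "gptq_v2" weight layout internally, while checkpoints saved with checkpoint_format="gptq" are converted back to the v1 layout so auto-gptq can still load them ("need convert to v1 before gptqmodel save", "always convert v2 to v1 if checkpoint_format = 'gptq'", "move convert code to quantizer.save()"). A rough sketch of that save-time control flow; the helper name hf_convert_gptq_v2_to_v1_format comes from the commit messages, but the stub below and every signature in it are assumptions, not the real gptqmodel or optimum API:

```python
def hf_convert_gptq_v2_to_v1_format(model, quantization_config):
    """Stand-in stub for the gptqmodel helper named in the commits above.

    The real helper converts the quantized weight tensors from the v2 layout
    back to the v1 ("gptq") layout; here it is a no-op so the sketch runs.
    """
    return model

def save_quantized(model, quantization_config, save_dir):
    """Hypothetical save path mirroring the behaviour the commits describe."""
    # gptqmodel quantizes in the "gptq_v2" layout; if the checkpoint is
    # advertised as checkpoint_format="gptq", always convert v2 -> v1 first
    # so auto-gptq can still load the saved files.
    if quantization_config.get("checkpoint_format", "gptq") == "gptq":
        model = hf_convert_gptq_v2_to_v1_format(model, quantization_config)
    model.save_pretrained(save_dir)  # assumes a transformers-style model object
```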
No description provided.